Genome-wide meta-analyses of cross substance use disorders in European, African, and Latino ancestry populations

Genetic risks for substance use disorders (SUDs) are due to both SUD-specific and SUD-shared genes. We performed the largest multivariate analyses to date to search for SUD-shared genes using samples of European (EA), African (AA), and Latino (LA) ancestries. By focusing on variants with cross-SUD and cross-ancestry concordant effects, we identified 45 loci. Through gene-based analyses, gene mapping, and gene prioritization, we identified 250 SUD-shared genes. These genes are highly expressed in the amygdala, cortex, hippocampus, hypothalamus, and thalamus, primarily in neuronal cells. Cross-SUD concordant variants explained ~50% of the heritability of each SUD in EA. The 5% of individuals with the highest polygenic scores were approximately twice as likely to have SUDs as all others in EA and LA. Polygenic scores had higher predictability in females than in males in EA. Using real-world data, we identified five drugs targeting SUD-shared genes that may be repurposed to treat SUDs.


Introduction
Substance use disorders (SUDs, including alcohol, cannabis, opioid, and nicotine use disorders) have devastating consequences for individuals, their families, and society.
Globally, approximately 5.5% of disability-adjusted life-years are attributable to SUDs 1. For each SUD, twin and family studies have shown that genetic factors account for ~50% of the variation 2. Searching for SUD-associated genes will not only help us understand the genetic etiologies of SUDs but can also facilitate the development of novel prevention and treatment strategies by identifying drugs that target SUD-related genes.
SUDs share common features such as uncontrolled substance use, withdrawal/negative affect, and compromised executive functioning 3. Many people use more than one substance and have multiple SUDs simultaneously 4. This comorbidity suggests genetic components shared across SUDs, as demonstrated by twin studies [5][6][7] and genetic correlation studies 8,9. Furthermore, recent large-scale genome-wide association studies (GWAS) of individual SUDs have identified genes associated with more than one SUD, e.g., DRD2 10,11, FTO 10,[12][13][14], and PDE4B 10,11,13. One way to identify SUD-shared genes is to perform a univariate analysis that defines cases as individuals with any SUD and controls as individuals without any SUD. However, this approach typically requires individual-level genotype data, so studies using it have had small to moderate sample sizes [15][16][17][18][19], and their findings remain to be validated. Multivariate methods such as meta-analysis and genomic structural equation modeling (genomic SEM) 20 can use summary statistics from large-scale GWAS and are therefore more powerful. Recent large-scale studies using genomic SEM to search for SUD-shared genes reported novel associations 8,21. However, these studies explained only a small portion of the genetic variation in SUDs, and many SUD-shared genes remain to be discovered 8,21.
To identify more SUD-shared genes, we performed the largest cross-SUD meta-analyses to date. GWAS summary statistics of problematic alcohol use (PAU) 10,12,22, cannabis use disorder (CUD) 23, opioid use disorder (OUD) 24,25, and nicotine use disorder (NUD) 26,27 were included. We first performed cross-SUD meta-analyses in European (EA), African (AA), and Latino ancestry (LA) samples. Because many individuals have multiple SUDs simultaneously, it is unlikely that a variant increases the risk for one SUD but decreases the risk for another SUD in the same person. We therefore retained only variants with the same direction of effect across SUDs (i.e., SUD-concordant variants). We also performed three cross-ancestry meta-analyses: EA and AA (EA_AA); EA and LA (EA_LA); and EA, AA, and LA (EA_AA_LA). If a variant is associated with SUDs in different populations, it likely has the same direction of effect across those populations; we therefore retained only population-concordant variants in the cross-ancestry meta-analyses. Because these SUDs are correlated, we used fixed-effects meta-analysis rather than genomic SEM, as it is more powerful [28][29][30]. We performed gene-based analyses using SUD-concordant variants, limited to genes expressed in at least one of the 13 brain regions in GTEx 31. We used multiple approaches to map variants to genes and to prioritize the mapped genes. For the prioritized genes, we then performed brain dissection and cell-type enrichment analyses using single-cell RNA expression data from the BRAIN Initiative Cell Census Network (BICCN) 32. Heritability estimation and genetic correlation analyses were also performed. We generated polygenic scores and tested their predictability for SUDs using data from the All of Us research program and the Indiana Biobank 33. Lastly, we identified drugs targeting the prioritized genes and used real-world data to test whether these drugs could be repurposed to treat SUDs.
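The concordance filter and fixed-effects meta-analysis described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard inverse-variance weighted fixed-effects model, per-SUD (or per-ancestry) effects already on a common scale such as log odds ratios, and independent inputs. It omits the sample-overlap correction the authors applied, whose exact method is not described here. The function names are hypothetical.

```python
import math


def is_concordant(betas):
    """True if every effect has the same nonzero sign (all risk-increasing or all protective)."""
    return all(b > 0 for b in betas) or all(b < 0 for b in betas)


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one variant.

    Returns (beta, se, z, two-sided P). Assumes a common effect scale and
    independent input studies (no sample-overlap correction).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return beta, se, z, p


def concordant_meta(variants):
    """variants: {variant_id: [(beta, se), ...]}, one (beta, se) per SUD or ancestry.

    Variants whose effects point in different directions are dropped
    before meta-analysis; the rest are combined with fixed effects.
    """
    results = {}
    for vid, stats in variants.items():
        betas = [b for b, _ in stats]
        ses = [s for _, s in stats]
        if not is_concordant(betas):
            continue  # discordant across inputs: filtered out
        results[vid] = fixed_effects_meta(betas, ses)
    return results
```

For example, `concordant_meta({"rsA": [(0.1, 0.1), (0.2, 0.1)], "rsB": [(0.1, 0.1), (-0.1, 0.1)]})` keeps only `rsA`, with a pooled effect of 0.15. Filtering before pooling means that a variant with a strong effect in one SUD and an opposite effect in another never enters the combined result.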

Results
Sample description. GWAS summary statistics included in this study are summarized in Table 1. EA GWAS included PAU 10, CUD 23, OUD 24,25, NUD 26,27, and substance abuse from the FinnGen consortium 34. Diagnoses of each SUD in these GWAS were based on DSM-IV, ICD-9/10 codes, the problem subscale of the Alcohol Use Disorders Identification Test, and the Fagerström Test for Nicotine Dependence. SUD cases in FinnGen were defined as individuals with any SUD, and controls as individuals without any SUD or mental disorder. Sample sizes in EA GWAS ranged from 251,534 to 435,563. AA GWAS included alcohol use disorder (AUD) 12,22, CUD 23, OUD 24,25, and NUD 27, with sample sizes ranging from 9,745 to 91,026. LA GWAS included AUD (N = 14,175) 12 and OUD (N = 34,861) 24.
For each SUD, if there was more than one GWAS (e.g., EA OUD GWAS), they were meta-analyzed first, before the cross-SUD meta-analyses. Sample overlap across studies was corrected for during meta-analysis. The effective sample sizes after correcting for overlapping samples were 1,467,929 (EA), 159,000 (AA), 45,727 (LA), and 1,672,656 (cross-ancestry total). In EA samples, the genetic correlations among PAU, OUD, CUD, and SUD ranged from 0.48 to 0.70 (P-values ≤ 5.19E-12; Supplemental Table 1), indicating a shared genetic underpinning across SUDs.
Identification of loci associated with SUDs. We defined independent lead variants as genome-wide significant (GWS) variants with the smallest P-values and linkage disequilibrium (LD) r² < 0.1 with other GWS variants. We defined a significant locus as the chromosomal region surrounding an independent lead variant, bordered by variants with LD r² > 0.6 with that lead variant. If two loci were < 250 kb apart, they were merged.
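The locus-merging rule described above can be sketched as a simple interval merge. This is a minimal sketch under stated assumptions: each input locus is already a (chromosome, start, end) region around one independent lead variant (the r² > 0.6 borders are computed beforehand from an LD reference, which is not shown), and "distance between two loci" is taken to mean the gap between the end of one region and the start of the next. The function and constant names are hypothetical.

```python
MERGE_DISTANCE_BP = 250_000  # loci closer than 250 kb are merged


def merge_loci(loci, max_gap=MERGE_DISTANCE_BP):
    """Merge loci given as (chrom, start, end) whose gap is < max_gap.

    Assumes each locus already spans the region around one independent
    lead variant, bordered by variants with LD r^2 > 0.6 with it.
    Overlapping loci have a negative gap and are always merged.
    Chromosome labels must all be the same type so they sort consistently.
    """
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < max_gap:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Because the function only needs region boundaries, the same step could be applied to the pooled loci from all cross-SUD and cross-ancestry meta-analyses, which is how overlapping results from different meta-analyses would collapse into a single combined list.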
In cross-SUD meta-analyses, the numbers of independent lead variants identified were 68 (40 loci) in EA, 1 (1 locus) in AA, and 4 (3 loci) in LA (Supplemental Tables 2-4). However, the locus in AA was an intergenic variant without any LD support; it was therefore likely a false positive and was excluded from further analysis. rs1229984 in ADH1B, a well-known alcohol metabolism gene associated with AUD 35, was identified in LA. Because only AUD and OUD GWAS were available in LA, this finding was likely driven by AUD, but we included this locus in subsequent analyses because we cannot rule out its role in other SUDs. In cross-ancestry meta-analyses, the numbers of independent lead variants identified were 29 (20 loci) in AA_EA, 35 (24 loci) in EA_LA, and 17 (12 loci) in AA_EA_LA (Supplemental Tables 5-7). Manhattan plots of the cross-SUD and cross-ancestry meta-analyses are in Supplemental Fig. 1. By merging loci identified in all meta-analyses with inter-locus distances < 250 kb, we identified 45 loci in total (Table 2). LocusZoom plots of these 45 loci in the different meta-analyses are in Supplemental Fig. 2. Among them, 29 loci were identified by multiple meta-analyses (Table 2), and many of these have different independent lead variants in different meta-analyses (Supplemental Fig. 2).
There are 104 genes that have at least one GWS variant within their boundaries (transcription start and end sites ± 1 kb; Supplemental Tables 2-7). Seven variants change protein products (missense or stop-gain): rs11604671 in ANKK1, rs11601425 in TMPRSS5, rs601338 in FUT2, rs2287922 in RASIP1, rs1229984 in ADH1B, rs3736781 in BTN1A1, and rs1057868 in POR. ANKK1 and TMPRSS5 are ~8 kb and ~212 kb, respectively, from DRD2, a well-known SUD-associated gene. ANKK1 is part of a signal transduction pathway, and TMPRSS5 is a member of the serine protease family. Notably, the missense variant rs11604671 in ANKK1 had P-values < 0.05 in all SUD GWAS in EA. FUT2 and RASIP1 are ~50 kb and ~25 kb, respectively, from FGF21, which is related to taste liking measurement and alcohol consumption. FUT2 encodes the galactoside 2-alpha-L-fucosyltransferase enzyme, and RASIP1 is related to GTPase binding and protein homodimerization activities. BTN1A1 belongs to the immunoglobulin superfamily. POR is reported in the GWAS catalog as related to coffee and tea consumption 36. Additionally, 10 protein-changing variants in nine genes have LD r² > 0.6 with independent lead variants (Supplemental Table 8; all P-values ≤ 1.85E-06). Seven of them are in ANKK1, BTN1A1, FUT2, POR, RASIP1, and TMPRSS5, and the other three are in other genes (rs61785974 in PHACTR4, rs6720 in MDH2, and rs589292 in SCAI). All of these genes are reported in the GWAS catalog as associated with SUD-related or neuropsychiatric traits 36.
Gene-based analysis. MAGMA 37 was used to perform gene-based analysis within each ancestral population, and the gene-based results from the populations were then meta-analyzed, also using MAGMA 37. After Bonferroni correction, the numbers of significant genes identified were 124 in EA, 1 in AA, 2 in LA, 137 in AA_EA, 109 in EA_LA, and 137 in AA_EA_LA (Supplemental Tables 9-14). In total, 169 unique genes were identified by gene-based analysis.
Manhattan plots of gene-based analyses are in Supplemental Fig. 3.
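The Bonferroni step and the count of unique genes across analyses can be illustrated with a short sketch. It assumes the conventional Bonferroni threshold of alpha divided by the number of genes tested in a given analysis (the number of genes tested is not reported here) and takes gene-level P-values as already produced by MAGMA; the function names are hypothetical and do not belong to MAGMA.

```python
def bonferroni_hits(gene_pvalues, alpha=0.05):
    """Return (significant genes sorted by P-value, threshold) for one analysis.

    gene_pvalues: {gene: gene-based P-value}; the threshold is
    alpha / number of genes tested in this analysis.
    """
    threshold = alpha / len(gene_pvalues)
    hits = sorted((g for g, p in gene_pvalues.items() if p < threshold),
                  key=lambda g: gene_pvalues[g])
    return hits, threshold


def unique_genes(*hit_lists):
    """Union of significant genes across analyses (EA, AA, LA, AA_EA, ...)."""
    return set().union(*hit_lists)
```

Taking the union across the six analyses explains why the total of unique genes is smaller than the sum of the per-analysis counts: the same gene is often significant in EA and in the cross-ancestry meta-analyses that include EA.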
Mapping significant variants to genes. For all GWS variants, we used positional mapping (whether a GWS variant is in a gene), eQTL mapping (whether a GWS variant is a cis-eQTL of a gene), and chromatin interaction mapping (whether a GWS variant interacts with a gene) to identify the genes affected by each variant.
As in the gene-based analysis, we limited mapping to genes expressed in at least one brain tissue in GTEx 31. The numbers of genes identified were 217 in EA, 0 in AA, 6 in LA, 127 in AA_EA, 148 in EA_LA, and 85 in AA_EA_LA (Supplemental Tables 15-19). In total, 244 unique genes were identified.
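Positional mapping with the brain-expression filter can be sketched as below. Two assumptions are made for illustration: the gene boundary is the transcription start and end sites ± 1 kb (the definition used earlier in these Results when counting genes containing GWS variants), and a per-gene flag for expression in at least one GTEx brain tissue has been precomputed. eQTL and chromatin-interaction mapping, which require external eQTL and chromatin-interaction resources, are not shown. The `Gene` type and `positional_map` function are hypothetical.

```python
from dataclasses import dataclass

WINDOW_BP = 1_000  # assumed gene boundary: transcription start/end +/- 1 kb


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # lower genomic coordinate of the transcript
    end: int    # upper genomic coordinate of the transcript
    brain_expressed: bool  # expressed in >= 1 GTEx brain tissue (precomputed)


def positional_map(variants, genes, window=WINDOW_BP):
    """Map (variant_id, chrom, pos) tuples to brain-expressed genes they fall in.

    Returns {variant_id: sorted gene names}; unmapped variants are omitted.
    """
    mapped = {}
    for vid, chrom, pos in variants:
        hits = sorted(
            g.name
            for g in genes
            if g.brain_expressed
            and g.chrom == chrom
            and g.start - window <= pos <= g.end + window
        )
        if hits:
            mapped[vid] = hits
    return mapped
```

A variant near the boundary between two adjacent genes can map to both, which is one reason a mapped gene is not necessarily the causal gene and further prioritization is needed.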
Gene prioritization. Together, gene-based analysis and gene mapping identified 299 genes. However, some of these may simply lie close to SUD-associated genes without being SUD-related themselves; we therefore used two strategies to prioritize mapped genes. First, we checked which genes were in any of nine SUD-related pathways (alcoholism, amphetamine addiction, cocaine addiction, morphine addiction, nicotine addiction, dopaminergic synapse, GABAergic synapse, glutamatergic synapse, and MAPK signaling) from the Kyoto Encyclopedia of Genes and Genomes (KEGG: https://www.genome.jp/kegg/); nine genes were within these pathways. We also prioritized genes in the same gene families as those in the SUD pathways (12 genes) or that directly interact with genes in the SUD pathways (73 genes), using the STRING database 38. In total, this strategy prioritized 94 genes (Supplemental Table 20). Second, we identified 240 genes reported in the GWAS catalog 36 as associated with any psychiatric trait, brain measurement, or brain function. Together, these two strategies prioritized 250 genes, comprising 220 protein-coding genes, 15 non-coding RNAs, 10 antisense genes, and 5 pseudogenes (Supplemental Table 20). Some genes are worth noting. For instance, OPRD1, which was identified by positional mapping, is a member of the opioid receptor signaling pathway and was nominated as SUD-related in some small studies [39][40][41], but has not been reported as SUD-related in the GWAS catalog 36.
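The two prioritization strategies amount to set operations over the candidate genes. The sketch below assumes the annotation sets (KEGG pathway members, same-family genes, direct STRING interaction partners of pathway genes, and GWAS-catalog genes with psychiatric or brain traits) have already been retrieved; the function name and gene symbols in the usage example are hypothetical. If both reported counts refer to subsets of the 299 candidates, inclusion-exclusion gives 94 + 240 - 250 = 84 genes prioritized by both strategies.

```python
def prioritize(candidates, kegg_pathway_genes, same_family_genes,
               string_partners, gwas_catalog_genes):
    """Two-strategy prioritization sketch; all arguments are sets of gene symbols.

    Strategy 1: candidate is in a SUD-related KEGG pathway, in the same gene
    family as a pathway gene, or a direct STRING interaction partner of one.
    Strategy 2: candidate is reported in the GWAS catalog for a psychiatric
    trait, brain measurement, or brain function.
    Returns (strategy 1 genes, strategy 2 genes, their union).
    """
    strategy1 = candidates & (kegg_pathway_genes | same_family_genes | string_partners)
    strategy2 = candidates & gwas_catalog_genes
    return strategy1, strategy2, strategy1 | strategy2
```

Usage: `prioritize({"GENE_A", "GENE_B", "GENE_C"}, {"GENE_A"}, set(), {"GENE_B"}, {"GENE_A", "GENE_Z"})` returns `GENE_A` and `GENE_B` from strategy 1, `GENE_A` from strategy 2, and drops `GENE_C`, which has no supporting annotation.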
Brain dissection and cell-type enrichment analyses. For the 250 prioritized genes, we investigated the brain dissections and cell types in which they were significantly highly or lowly expressed. We used high-throughput single-cell RNA sequencing data from the Human Brain Cell Atlas v1.0, generated by the BRAIN Initiative Cell Census Network 32. These data sampled 105 dissections from 10 brain regions across the forebrain, midbrain, and hindbrain, and identified 461 cell types. Prioritized genes were highly expressed in 35 dissections: 1 from the amygdala, 27 from the cortex, 2 from the hippocampus, 3 from the hypothalamus, and 2 from the thalamus (Supplemental Table 21). Prioritized genes were lowly expressed in 2 dissections (one from the pons and one from the medulla; Supplemental Table 22). Prioritized genes were highly expressed in 125 of the 461 cell types, the majority of which are neuronal (Supplemental Table 23). Most of these cell types are from the amygdala, basal forebrain, cerebral cortex, hippocampus, hypothalamus, and thalamus, in agreement with the dissection enrichment analysis. Prioritized genes were lowly expressed in one cell type (splatter) from the midbrain (Supplemental Table 24).
Heritability estimation and heritability explained by SUD-concordant variants. Because of complicated LD patterns in the AA and LA samples and mismatches between their LD structures and external LD reference panels, heritability was estimated only in EA samples. We identified 477,686 SUD-concordant variants in EA; using these variants, the SNP heritability (h²SNP) of SUDs estimated with LD score regression 42 was 0.10 (SE = 0.001). SUD-concordant variants explained 46.6%, 53.0%, 59.1%, and 50.8% of the h²SNP of PAU, OUD, CUD, and NUD, respectively, in EA (Supplemental Table 25).
Polygenic score (PGS) analyses. We calculated PGS and tested their predictability in samples from the All of Us research program (Version 7.1) and the Indiana Biobank 33. Because a major goal of PGS analysis is to identify high-risk individuals, we compared the top 5% of individuals with the highest PGS with everyone else. Additionally, because SUD prevalence differs between males and females, we also performed sex-stratified analyses. In EA, the top 5% of individuals were about twice as likely to have SUDs as everyone else in both All of Us (odds ratio (OR) = 2.21, 95% confidence interval (CI): ). In AA, PGS were not significant in any analysis (Table 3).
Drug repurposing. We queried the Drug Gene Interaction Database (DGIdb, v4.2.0) 43 and identified 233 drugs approved by the Food and Drug Administration (FDA) that target 29 genes (278 drug-gene pairs). Among them, 93 drugs belong to the nervous system class of the Anatomical Therapeutic Chemical (ATC) Classification System (ATC code N; 159 drug-gene pairs targeting 10 genes: ANKK1, BDNF, CHRM4, DRD2, GABRA4, HTR3B, NCAM1, OPRD1, PDE4B, and POR) (Supplemental Table 27). To identify potentially repurposable drugs, we focused on drugs that 1) have ATC code N, 2) are not approved to treat any SUD, 3) are related to SUD treatment according to a literature search [44][45][46][47][48][49], and 4) have comparators (i.e., drugs with the same ATC level 4 code as an identified drug but not targeting any identified gene). Five drugs met these criteria: topiramate in ATC N03AX (treating epilepsy, targeting GABRA4); desipramine, imipramine, and nortriptyline in ATC N06AA (treating depression and anxiety, targeting BDNF); and methylphenidate in ATC N06BA (treating ADHD, targeting DRD2). We used large-scale real-world data from Clinformatics® (https://www.optum.com/business/life-sciences/commercialanalytics/managed-markets.html; sample sizes 325,542 to 1,817,258; Supplemental Table 28) to investigate whether these five drugs reduced the risk of SUDs. Compared with users of comparators, the hazard ratios for developing SUDs were 0.44 (95% CI: 0.42-0.47) for users of topiramate; 0.89 (95% CI: 0.84-0.94) for users of desipramine, imipramine, or nortriptyline; and 0.84 (95% CI: 0.78-0.91) for users of methylphenidate, indicating that these five drugs were associated with a significantly reduced risk of SUDs and may be repurposed to treat SUDs.
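The four drug-selection criteria and the comparator definition can be expressed as a filter over drug records. This is a hedged sketch: the `Drug` record, the `literature_support` flag (standing in for the manual literature search), and the function names are hypothetical, and the drug-gene pairs would in practice come from a DGIdb query. It relies only on the ATC convention that the level 4 (chemical subgroup) code is the first five characters of a full code, e.g. N03AX. The subsequent real-world hazard-ratio analysis is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drug:
    name: str
    atc_codes: tuple[str, ...]   # full 7-character ATC codes
    targets: frozenset[str]      # gene symbols the drug targets
    approved_for_sud: bool
    literature_support: bool     # hypothetical flag from a manual literature review


def atc_level4(code):
    """ATC level 4 (chemical subgroup) is the first five characters, e.g. 'N03AX'."""
    return code[:5]


def repurposing_candidates(drugs, sud_genes):
    """Apply the four selection criteria; return [(drug name, [comparator names])]."""
    out = []
    for d in drugs:
        if not (d.targets & sud_genes):
            continue  # must target at least one prioritized gene
        n_groups = {atc_level4(c) for c in d.atc_codes if c.startswith("N")}
        if not n_groups or d.approved_for_sud or not d.literature_support:
            continue  # criteria 1-3: ATC class N, not SUD-approved, literature support
        comparators = sorted(
            o.name for o in drugs
            if not (o.targets & sud_genes)
            and any(atc_level4(c) in n_groups for c in o.atc_codes)
        )
        if comparators:  # criterion 4: same ATC level 4, no prioritized target
            out.append((d.name, comparators))
    return out
```

Restricting comparators to drugs in the same ATC level 4 subgroup gives an active-comparator design: candidates are contrasted with drugs prescribed for similar indications, which reduces (but does not remove) confounding by indication.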
directions of effects for different SUDs and/or in different populations, and they were filtered out by our strategy. Fifth, analyses based on real-world data were subject to unmeasured confounding and to inconsistencies between health insurance records and true health status.
As noted by de Hemptinne and Posthuma, genetic findings should not be taken out of context, misinterpreted, or misused 51. In this study, we identified more variants in EA than in AA and LA, and the predictability of PGS was higher in EA than in AA and LA. We emphasize that these differences are due to the much smaller sample sizes in AA and LA and to the lack of appropriate statistical methods for admixed populations such as AA and LA. These differences do not suggest that AA and LA have different genetic mechanisms of SUDs. Rather, our results point to the critical and urgent need to increase the sample sizes of under-represented populations to reduce health disparities. We also emphasize that genetic factors, or PGS (as an overall measure of the effects of genetic factors), do not determine SUD status. Genetic factors and environmental factors, such as traumatic life events and external stressors, contribute about equally to SUDs. Findings from genetic studies must be interpreted with caution and cannot be used to predict SUD-related outcomes without carefully considering environmental factors. Our findings should never be used to discriminate against or stigmatize people, or to deny access to prevention and treatment programs, especially for those from vulnerable populations. Our findings should be used only for research purposes to promote health.
In conclusion, using SUD-concordant and population-concordant variants, we identified multiple genes associated with SUDs. These findings can help elucidate the mechanisms of SUDs. PGS calculated using SUD-concordant variants had higher predictability in EA and LA than in AA. We also identified five drugs targeting SUD-associated genes that may be repurposed to treat SUDs.
Declarations
study_id=phs003025.v1.p1. All GWAS summary statistics from this study will be deposited in the GWAS catalog: https://www.ebi.ac.uk/gwas/. All variants and their weights used to calculate PGS will be available from the PGS catalog: https://www.pgscatalog.org/.

Table 2
Genome-wide significant loci and their lead variants.

Table 3
Note: N.case: number of cases; N.control: number of controls. Significant P-values are in bold. AOU: All of Us. IB: Indiana Biobank. AA: African ancestry. EA: European ancestry. LA: Latino ancestry.